I'm developing a C++ Date class (GED-DATE) to help with parsing GEDCOM files. What I need help with is collecting as many examples as possible from the 'wild'. This may include non-standard forms as well (_DATE comes to mind). What I want is to be able to parse what is out there (thanks Mulder). As an example, in my dev directory I have:

  1. allged.ged
  2. example.ged
  3. myers.ged
  4. simple.ged
  5. torture.ged

That makes a total of 54358 lines of GEDCOM data and 5969 lines of DATE information. Hardly a sufficient sample, but a start.

I gathered my test file with a command line of:

C:\> grep DATE *.ged > dates.txt

If those of you willing to help out would do the same for as many .ged files as you have, and either link or post the results in this discussion, I would greatly appreciate it. In addition, if you know of any publicly available .ged files, links to those would be very helpful.
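For anyone without grep to hand, the same extraction can be approximated in a few lines of C++. This is only a minimal sketch: the function name extract_date_lines is made up for illustration, and it does a plain substring match just as grep DATE would, so it also picks up _DATE and any other line that happens to contain the letters DATE.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns every line of `in` that contains the
// substring "DATE", mirroring what `grep DATE` prints for one file.
std::vector<std::string> extract_date_lines(std::istream& in) {
    std::vector<std::string> hits;
    std::string line;
    while (std::getline(in, line)) {
        // Drop a trailing CR so files with CRLF line endings behave the same.
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();
        }
        if (line.find("DATE") != std::string::npos) {
            hits.push_back(line);
        }
    }
    return hits;
}
```

Opening each .ged file with std::ifstream, passing it to this function, and writing the returned lines to dates.txt should give much the same result as the command line above, minus the file name prefix that grep adds when given several files.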

You might ask what use this would be to BetterGedcom. Without (hopefully) getting into any politics, it is my feeling that BG will need to handle legacy DATE forms in any attempt to move ahead. This would certainly include the material I'm looking for. A second consideration is that this will be open source and possibly of use to BG when it comes time to cut code. My plan is to be able to parse and convert any DATE form, excepting DATE_PHRASE of course, and to provide date arithmetic (ranges, age, etc.) as well as a variety of output formats. I believe I'm not the only one who has stared at +1 DATE PLUV 0012 without knowing that it really (Gregorian bias here...) was equivalent to +1 DATE FROM 20 JAN TO 18 FEB 1804 (I think I got that right :) ).
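To make the idea of sorting DATE forms a little more concrete, here is a rough sketch of a first pass over a single line. It is not the GED-DATE class itself: DateLine, DateKind, parse_date_line and classify are names invented for this illustration. It assumes the GEDCOM 5.5 conventions of a calendar escape such as @#DFRENCH R@ in front of the value and the keywords FROM/TO, BET/BEF/AFT, ABT/CAL/EST and INT, and it only labels the form without converting anything.

```cpp
#include <sstream>
#include <string>

// Illustrative types only; none of these names come from GED-DATE.
enum class DateKind { Empty, Exact, Approximate, Range, Period, Interpreted, Phrase };

struct DateLine {
    int level = -1;
    std::string tag;       // "DATE" or the non-standard "_DATE"
    std::string calendar;  // text inside @#D...@, e.g. "FRENCH R"; empty if absent
    std::string value;     // remainder of the line after tag and escape
    bool ok = false;       // true only for DATE/_DATE lines
};

// Splits "<level> <tag> [@#Dcalendar@] <value>" into its parts.
DateLine parse_date_line(const std::string& line) {
    DateLine out;
    std::istringstream in(line);
    if (!(in >> out.level >> out.tag)) return out;
    if (out.tag != "DATE" && out.tag != "_DATE") return out;

    std::string rest;
    std::getline(in, rest);
    std::string::size_type start = rest.find_first_not_of(' ');
    rest = (start == std::string::npos) ? "" : rest.substr(start);

    // A calendar escape, when present, comes first: @#DFRENCH R@, @#DJULIAN@, ...
    if (rest.compare(0, 3, "@#D") == 0) {
        std::string::size_type end = rest.find('@', 3);
        if (end != std::string::npos) {
            out.calendar = rest.substr(3, end - 3);
            start = rest.find_first_not_of(' ', end + 1);
            rest = (start == std::string::npos) ? "" : rest.substr(start);
        }
    }
    out.value = rest;
    out.ok = true;
    return out;
}

// Labels a DATE value by its leading keyword; no validation of the date itself.
DateKind classify(const std::string& value) {
    if (value.empty()) return DateKind::Empty;
    if (value[0] == '(') return DateKind::Phrase;  // DATE_PHRASE: left alone
    std::istringstream in(value);
    std::string word;
    in >> word;
    if (word == "FROM" || word == "TO") return DateKind::Period;
    if (word == "BET" || word == "BEF" || word == "AFT") return DateKind::Range;
    if (word == "ABT" || word == "CAL" || word == "EST") return DateKind::Approximate;
    if (word == "INT") return DateKind::Interpreted;
    return DateKind::Exact;
}
```

Run over a dates.txt collected from many files, a pass like this would at least show how often each form and each calendar escape turns up in the wild, and anything that lands in an unexpected bucket is a candidate for the non-standard pile.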

Also for those interested in the 'DATE' problem, it would be quite useful to me to hear what people might like in such a creation or any other discussion on the subject.

hsmyers
(hsmyers atsign gmail dot com)
usual sig='s --hsm

Comments

Andy_Hatchett 2013-05-29T17:35:31-07:00
Discussions
Discussions have not been turned off. Just click the double bubble next to Edit.
louiskessler 2013-05-29T18:51:57-07:00
Thanks, Andy. I guess the "new" format of wikispaces threw me off.
Andy_Hatchett 2013-05-29T17:44:55-07:00
Large GEDCOM
If you want a *very* large GEDCOM for testing, look for the stobie.ged on rootsweb. It contains over 400,000 people.
hsmyers 2013-05-31T08:42:35-07:00
I will most certainly do that: 400k x, say, 2 DATEs per person equals 800k candidates for testing. When I'm not messing with GEDCOM libraries, I write Perl code to parse chess games, and my regression test for move parsing is about 1.5 million, so this plus the other site posted is very probably going to put me way above that!
louiskessler 2013-05-29T18:51:04-07:00
GEDCOMs via Google
Try: https://www.google.com/search?q=%220+head%22+filetype%3Aged which lists 23,600 GEDCOM files.

Hopefully that will be enough for you. :-)

Louis
hsmyers 2013-05-31T08:39:16-07:00
Yes, I believe that just might be the ticket. Thankfully I've a small herd of terabyte drives :)